How to make your scientific code accessible and reproducible?

Using Git and GitHub

Steve Vissault

inSileco

April 15, 2025

Who am I?

  • Steve Vissault, M.Sc. in forest ecology
    • How climate change will shape forest distributions by the end of this century
  • Research professional at UdS for 3 years
  • Then I did software development for Omnimed, an Electronic Medical Record (EMR) system, for 3 years
  • Started a consulting company, inSileco, 4 years ago with 2 amazing ecologists
  • Now taking a position at INRS with Valérie Langlois for the next 2 years

Workshop 1
What is Git, why and how to use it?

What is Git?

  • Git is a version control system: software that lets you track additions and modifications to a set of files within a folder, called a repository
  • Its original purpose was to help groups of developers work collaboratively on big software projects.
  • Think of Git as the “Track Changes” feature of Microsoft Word on steroids.

Git - Context

Compiled software is a bunch of files. With each improvement, developers want the software to remain stable, with no regressions. For that, they have to know which files have changed between snapshots.

Why is this relevant to a scientist?

Git foundations

Git foundations: add & commit

Git foundations: branches

  • A branch (main by default) is a series of commits.
  • The latest commit is called the head of the branch (HEAD); it contains the most up-to-date version of the files.
  • Each commit has a version of the files attached to it.

Git foundations: local vs remote

How Git helps scientific reproducibility

  • Git is a coding notebook: it documents not only the code but also the coding process
  • It makes your code findable and reusable by publishing it to a remote repository on GitHub
  • All the source code, data and figures are accessible from one location

In summary

  1. Initiate the repository (git init)
  2. Edit your files
  3. Stage the modifications to be committed (git add)
  4. Create a new commit object (git commit)
  5. Go back to step 2.
  6. Publish your code on the remote server (git fetch & git push)
  7. Come back from vacation
  8. Check whether another developer has released a new version of the code on the remote repository (git fetch & git pull)
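
The cycle above can be sketched in a terminal as follows. This is a minimal sketch under stated assumptions: the scratch folder, the file name script_1.R, the commit message and the placeholder identity are illustrative, and a bare repository in a local folder stands in for the remote server so that the example runs without a GitHub account.

```shell
set -e

# Assumption: work in a throwaway folder so no real project is touched
workdir="$(mktemp -d)"
cd "$workdir"

# Assumption: a local bare repository plays the role of the remote server
git init --quiet --bare remote.git

# 1. Initiate the repository
git init --quiet project
cd project
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

# 2. Edit your files
echo 'x <- 1:10' > script_1.R

# 3. Stage the modifications to be committed
git add script_1.R

# 4. Create a new commit object
git commit --quiet -m "Add first analysis script"

# 6. Publish your code on the remote
git remote add origin "$workdir/remote.git"
git fetch --quiet origin
git push --quiet --set-upstream origin HEAD

# 8. Later, check for and apply new commits from the remote
git fetch --quiet origin
git pull --quiet --ff-only
```

With a real GitHub repository, only the address given to git remote add would change; the add, commit, push and pull steps stay the same.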

How to use Git in practice?

Prerequisites

  1. Make sure R version ≥ 4.0 is installed on your computer
  2. Install Git:
  • Windows: install Git for Windows (also known as msysgit or Git Bash) to get Git and tools like the Bash shell. Download the executable here
  • macOS: open a terminal and type:
    • git --version
    • git config
    • If Git isn’t installed, macOS will prompt you to install it.
  • Linux: Git is usually pre-installed, so there is likely nothing to do.

For more details: https://happygitwithr.com/install-git
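
On any of the three systems, a quick terminal check along these lines can confirm the installation. It is only a sketch: the variable name git_found and the messages are illustrative, and the identity lookup simply reports what Git currently has configured, if anything.

```shell
# Report whether Git is available on this machine
if command -v git >/dev/null 2>&1; then
  git_found="yes"
  git --version
else
  git_found="no"
  echo "Git is not installed yet"
fi

# Show the identity Git would attach to commits, if one is configured
git config --get user.name || echo "user.name is not set"
git config --get user.email || echo "user.email is not set"
```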

What we’re going to do

  1. Create a folder
  2. Initiate a git repo for this new folder
  3. Create a new R script (file 1) and edit it
  4. Stage this new R script in the git repo
  5. Add a commit to document the added file
  6. Create a second R script (file 2) in the git repo and edit it
  7. Repeat steps 4 and 5 for this second script
  8. Check the git log
  9. Go back to the first commit
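
For readers who prefer the terminal, the nine steps above could look roughly like this. The workshop itself uses the RStudio interface, so treat this as an equivalent sketch; the folder name, file contents, commit messages and placeholder identity are assumptions made for illustration.

```shell
set -e

# Steps 1-2: create a folder and initiate a Git repo in it
demo_dir="$(mktemp -d)/git-workshop"   # hypothetical location
mkdir -p "$demo_dir"
cd "$demo_dir"
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

# Steps 3-5: create the first script, stage it, commit it
echo 'print("first script")' > script_1.R
git add script_1.R
git commit --quiet -m "Add script_1.R"

# Steps 6-7: same for the second script
echo 'print("second script")' > script_2.R
git add script_2.R
git commit --quiet -m "Add script_2.R"

# Step 8: check the log (one line per commit, newest first)
git log --oneline

# Step 9: go back to the first commit
first_commit="$(git rev-list --max-parents=0 HEAD)"
git checkout --quiet "$first_commit"
ls
```

After the checkout, the working folder shows the files as they were at the first commit, so script_2.R is no longer listed. Running git checkout - returns to the latest commit of the branch.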

Initiate a Git repo

To flatten the learning curve, we will learn the RStudio way of interacting with Git.

There are many other ways to interact with Git:

  • Using R with the gert package
  • Using the terminal
  • Using a GUI such as GitHub Desktop

The key concepts and the process stay the same!

To use Git in RStudio, you need to create a new Project. Let’s start by creating our project directory.

Open RStudio and, in the top right corner, create a new project:

  • Select New Directory. Note that Version Control is reserved for when you have already created a repo on GitHub.
  • Select New Project
  • Choose a name for your repo and a location. Make sure that “Create a git repository” is checked

You can now look under the git tab, you should see this:

Create a new R script

Create a new file: go to the menu File => New file => R Script. Make sure to save it with the name script_1.R.

What happens in the git tab?

Understand file status

  • Files that are untracked are represented by a yellow question mark.
  • Files that have been added (see next section) are represented by a green A
  • Files that have been tracked and modified are represented by a blue M.
  • Files that are tracked but not modified do not show.
  • Files that have been deleted are shown with a red D.
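
The same statuses can be read in a terminal with git status --short, which prints a short code in front of each file: ?? for untracked, A for added, M for modified and D for deleted. The sketch below builds one file in each state; the file names and the placeholder identity are illustrative assumptions.

```shell
set -e
status_dir="$(mktemp -d)"   # throwaway folder for the demonstration
cd "$status_dir"
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

# Two files are tracked and committed once
echo 'a <- 1' > kept.R
echo 'b <- 2' > removed.R
git add kept.R removed.R
git commit --quiet -m "Add two scripts"

echo 'a <- 10' >> kept.R   # tracked and modified
rm removed.R               # tracked and deleted
echo 'c <- 3' > staged.R
git add staged.R           # newly added (staged)
echo 'd <- 4' > new.R      # untracked

# Print one status code per file
status_output="$(git status --short)"
echo "$status_output"
```

Files that are tracked and unchanged do not appear in this listing, just as in the RStudio Git tab.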

Stage a new R script

This is as simple as “checking off” the file in the Git tab. This is called Staging the file.

What is a gitignore? A .gitignore is a file that lists the files you never want Git to track. It can match certain file names (for instance, .csv or .tif files). This is useful when you need to make sure that certain files (like data files or large files) do not get added.
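
As an illustration, the sketch below writes a small .gitignore and shows its effect. The two patterns come from the example above; the file names measurements.csv, elevation.tif and analysis.R are hypothetical.

```shell
set -e
ignore_dir="$(mktemp -d)"   # throwaway folder for the demonstration
cd "$ignore_dir"
git init --quiet

# One pattern per line; lines starting with # are comments
cat > .gitignore <<'EOF'
# Data files and large rasters stay out of the repository
*.csv
*.tif
EOF

# Hypothetical project files
touch measurements.csv elevation.tif analysis.R

# Only .gitignore and analysis.R are listed as untracked
git status --short
```

The command git check-ignore followed by a file name tells you whether a given file is matched by one of the patterns.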

Create a commit

Click on commit on the top of the file list. This window should appear:

Before committing anything you need to add a commit message. It is important to add a useful message to your commit, a bit like a journal entry, so that you can remember what you committed.

Click on commit in order to commit the changes!

Edit your script

  • Note that once you have staged a file, you can still make more changes, but you need to re-run git add to add them to the index. Until you commit, those changes are not fully registered by Git; they are like a draft.
  • When you want to take a snapshot of a file, it means you are ready to commit the staged changes to the repository.
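
The difference between staged and not-yet-staged edits can be seen in a terminal. In this sketch (file contents and the placeholder identity are illustrative assumptions), git diff --cached shows what is in the index and would go into the next commit, while git diff shows edits made after staging.

```shell
set -e
cd "$(mktemp -d)"   # throwaway folder for the demonstration
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"  # placeholder identity

echo 'x <- 1' > script_1.R
git add script_1.R
git commit --quiet -m "Add script_1.R"

echo 'y <- 2' >> script_1.R
git add script_1.R            # first edit is staged
echo 'z <- 3' >> script_1.R   # second edit is made after staging

# What would be committed now (the staged draft)
staged_diff="$(git diff --cached)"
# What still needs another git add
unstaged_diff="$(git diff)"

echo "$staged_diff"
echo "$unstaged_diff"
```

Committing at this point would record the line y <- 2 only; the line z <- 3 stays in the working folder until it is staged and committed in turn.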

Workshop 2
What is Github and how to use it to collaborate?

Review of Workshop 1

In the first workshop, we learned the basics of Git. We covered:

  1. How to create a Git repository from an RStudio project
  2. How to stage a file or a group of files
  3. How to make a commit
  4. How to view the history of all commits
  5. How to tell Git to ignore files with a .gitignore file (relevant for sensitive data)

git Workflow Overview Reminder

The typical workflow:

  • Step 1: You modify files in your working directory and save them as usual.
  • Step 2: You stage files to mark your intention to “commit” them (in RStudio this is done by checking the box next to the file in the “Git” tab).
  • Step 3: You commit the staged files, which permanently stores them as snapshots to your git directory.

Tip

We can make an analogy with taking a family picture, where each family member would represent a file.

  • Staging files is like deciding which family member(s) are going to be in your next picture
  • Committing is like taking the picture

Git Glossary from Workshop 1

  • Git repository: A project folder that Git tracks. It contains all the files and a hidden .git folder where Git stores the complete history of changes.
  • Staged file(s): Files whose changes are ready to be committed
  • Unstaged file(s): Tracked files whose changes are not yet marked to be committed
  • Untracked file(s): Files that are not tracked by Git
  • Commit: A saved version of your project at a specific point in time (also called snapshot), including a message that explains what changed.

In this Workshop

  • How Git and GitHub work together
  • How to synchronize your local repository:
    • Push your commits to GitHub
    • Pull your collaborators’ commits from Github
  • How conflicts are generated and how to resolve them when applying your collaborators’ commits

What is GitHub?

  • GitHub is a web-based platform.
  • GitHub stores your Git repository on a remote server.
  • GitHub allows you to back up your work and make it findable and accessible to others.

How to connect your local repository to Github?

Step 1. If you do not have an account yet, sign up on GitHub by following the link

Step 2. Set up your credentials so that GitHub recognizes you

We have to store a token in our local system. This token is used to authenticate your computer with Github.

Tip

Token = password

Run the following command in RStudio:

usethis::create_github_token()

  1. This command opens a GitHub page in your browser; log in
  2. Give the token an explicit name (e.g. based on your current R project), then copy the token starting with ghp_

Run the following command in RStudio

gitcreds::gitcreds_set()

This command will open a prompt in RStudio

Paste the token you copied in the previous step

Check that everything is fine and the credentials are OK:

usethis::git_sitrep()

Note

This way, you will not have to enter your credentials every time you push to or pull from GitHub. If you want to know more about this authentication process, have a look at the usethis documentation

Step 3. Connect your local repository to GitHub

usethis::use_github()

Note

Use usethis::use_github(private = TRUE) if you want your repository to be private, for example because you are working with sensitive data.

The configuration is done and the local repository is now in sync! Let’s have a quick look at the GitHub repo on the website.

Quick overview of the Github repo website

  • Red: List of files
  • Blue: Latest commit
  • Green: when the files and directories were last modified

How to synchronize with the remote (Github)?

Let’s go back to RStudio and check the Git tab. Two new buttons, Pull and Push, are now available.

  • Step 1: Edit your files
  • Step 2: Stage and commit, record your changed local files as a snapshot in your local repository
  • Step 3: Pull, get file(s) from the cloud to your local computer (the opposite of a “push”)
  • Step 4: Push, move file(s) to the cloud from your local computer (the opposite of a “pull”)

Repeat steps 2 to 4 every time you edit, add or delete a file if you want to keep your local and remote repositories in sync.
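
The pull and push steps can be tried out without a network connection. In the sketch below, a bare repository in a local folder stands in for GitHub, and a second clone plays the role of a collaborator; all folder names, file names, commit messages and identities are assumptions made for illustration.

```shell
set -e
sync_dir="$(mktemp -d)"   # throwaway folder for the demonstration
cd "$sync_dir"

# Stand-in for the remote server
git init --quiet --bare remote.git

# Your local repository: edit, commit, then push
git init --quiet me
cd me
git config user.name "Me"                 # placeholder identity
git config user.email "me@example.org"    # placeholder identity
echo 'x <- 1' > script_1.R
git add script_1.R
git commit --quiet -m "Add script_1.R"
git remote add origin "$sync_dir/remote.git"
git push --quiet --set-upstream origin HEAD

# A collaborator clones the remote, commits and pushes
cd "$sync_dir"
git clone --quiet "$sync_dir/remote.git" collaborator
cd collaborator
git config user.name "Collaborator"              # placeholder identity
git config user.email "collab@example.org"       # placeholder identity
echo 'y <- 2' > script_2.R
git add script_2.R
git commit --quiet -m "Add script_2.R"
git push --quiet origin HEAD

# Back in your repository: pull to receive the collaborator's commit
cd "$sync_dir/me"
git pull --quiet --ff-only
git log --oneline
```

Pulling before pushing is what keeps the two copies in step: had you pushed without first receiving the collaborator's commit, Git would have refused the push and asked you to pull.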

Let’s practice

  • Create a new R script (file 2) and edit it
  • Stage this new R script in the git repo
  • Add a commit to document the added file
  • Create a second R script (file 3) in the git repo and edit it
  • Repeat the stage and commit steps for this second script
  • Check the git log
  • Publish the new files on the remote server (GitHub)